Reinforcement Learning for Humanoid Robotics
Abstract
Reinforcement learning offers one of the most general frameworks for taking traditional robotics towards true autonomy and versatility. However, applying reinforcement learning to high-dimensional movement systems like humanoid robots remains an unsolved problem. In this paper, we discuss different approaches to reinforcement learning in terms of their applicability to humanoid robotics. Methods can be coarsely classified into three categories: greedy methods, ‘vanilla’ policy gradient methods, and natural gradient methods. We argue that greedy methods are unlikely to scale to the domain of humanoid robotics because they are problematic when used with function approximation. ‘Vanilla’ policy gradient methods, on the other hand, have been successfully applied to real-world robots, including at least one humanoid robot [3]. We demonstrate that these methods can be significantly improved by using the natural policy gradient instead of the regular policy gradient. A derivation of the natural policy gradient is provided, proving that the average policy gradient of Kakade [10] is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. Under suitable conditions, this algorithm converges to the nearest local minimum of the cost function with respect to the Fisher information metric. The algorithm outperforms non-natural policy gradients by far in a cart-pole balancing evaluation and in learning nonlinear dynamic motor primitives for humanoid robot control. It offers a promising route for the development of reinforcement learning for truly high-dimensional continuous state-action systems.
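The difference between a 'vanilla' and a natural policy gradient can be illustrated with a small sketch. It is not the Natural Actor-Critic algorithm from the paper. It only shows the rescaling step, in which a gradient estimate is premultiplied by the inverse of the Fisher information matrix. The one-dimensional Gaussian policy, the function names (`gaussian_fisher`, `natural_gradient`), the damping term, and the numerical values are all assumptions made for illustration.

```python
import numpy as np


def gaussian_fisher(sigma):
    """Fisher information of N(mu, sigma^2) with respect to (mu, sigma).

    Assumed policy for illustration: a 1-D Gaussian over actions.
    """
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])


def natural_gradient(vanilla_grad, fisher, damping=1e-8):
    """Return F^{-1} g by solving a linear system instead of inverting F.

    The small damping term is an assumption added for numerical safety.
    """
    dim = fisher.shape[0]
    return np.linalg.solve(fisher + damping * np.eye(dim), vanilla_grad)


# Hypothetical vanilla gradient estimate with respect to (mu, sigma).
sigma = 0.1
vanilla = np.array([1.0, 0.5])
natural = natural_gradient(vanilla, gaussian_fisher(sigma))

print("vanilla gradient:", vanilla)
print("natural gradient:", natural)
```

In this toy setting the natural gradient shrinks the step as the policy becomes more deterministic (small `sigma`), since the same parameter change then alters the action distribution much more strongly.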
Similar resources
Episodic Reinforcement Learning Control Approach for Biped Walking
This paper presents a hybrid dynamic control approach to the realisation of humanoid biped robotic walk, focusing on the policy gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed structure of controller involves two feedback loops: a conventional computed torque controller and an episodic reinforcement learning controller. The reinforcement learning part inclu...
Robo-Erectus: a low-cost autonomous humanoid soccer robot
The humanoid soccer robot league is a new international initiative to foster robotics and AI technologies using soccer games [1]. This paper provides a brief description of a low-cost autonomous humanoid soccer robot called Robo-Erectus (RE), which has been developed in the Center for Advanced Robotics and Intelligent Control (ARICC) at Singapore Polytechnic since 2001. To develop a low-cost hu...
Policy Gradient Methods for Robot Control
Reinforcement learning offers the most general framework to take traditional robotics towards true autonomy and versatility. However, applying reinforcement learning to high dimensional movement systems like humanoid robots remains an unsolved problem. In this paper, we discuss different approaches of reinforcement learning in terms of their applicability in humanoid robotics. Methods can be co...
Reinforcement learning control algorithm for humanoid robot walking
The integrated dynamic control of humanoid locomotion mechanisms based on the spatial dynamic model of humanoid mechanism is presented in this paper. The control scheme was synthesized using the centralized model with proposed structure of dynamic controller that involves two feedback loops: position-velocity feedback of the robotic mechanism joints and reinforcement learning feedback around Ze...
Learning to Acquire Whole-Body Humanoid Center of Mass Movements to Achieve Dynamic Tasks
This paper presents a novel approach for acquiring dynamic whole-body movements on humanoid robots focused on learning a control policy for the center of mass (CoM). In our approach, we combine both a model-based CoM controller and a model-free reinforcement learning (RL) method to acquire dynamic whole-body movements in humanoid robots. (i) To cope with high dimensionality, we use a model-base...